Computer Science Technical Report Experimental Evaluation of Blocking and Non-Blocking Multithreaded Code Execution
Abstract
The objective of multithreaded execution models is masking the latency of interprocessor communications and remote memory accesses in large-scale multiprocessors. Several such models combine aspects of dataflow-like execution with the von Neumann model in an attempt to provide both efficient synchronization (as in the dataflow model) and efficient exploitation of program locality (as in the von Neumann model). We refer to these models as data-driven multithreading models. One of the factors that distinguishes these models is the thread execution strategy: A thread can be either non-blocking or blocking. Another factor is the architectural support for dynamic synchronization: The locality present within and among threads can potentially be exploited by a proper storage hierarchy for synchronization store (operand storage). Two storage models have been proposed for data-driven multithreaded execution. One is frame based, in which all the threads belonging to a code-block share one storage segment called frame; the other is framelet based, in which each thread has its own storage segment, called framelet. This article experimentally compares two thread execution models and their related storage models. The first is a blocking execution model that relies on a scheduler for the allocation of threads to processors and exploits inter-thread locality within a code-block. It relies on the frame storage model and assumes a certain amount of compile-time data distribution to minimize network accesses. The second is a non-blocking execution model in which threads are dynamically scheduled based on data availability. It relies on the framelet storage model and makes no assumptions about the static allocation of data to processors. The experimental evaluation takes into account the impact of the storage hierarchy design on the performance of the two models.
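To make the non-blocking, framelet-based model concrete, here is a minimal Python sketch. It is not taken from the report: the names `Framelet`, `Scheduler`, `send`, and `demo` are hypothetical, and the pending-operand counter is an assumed way of detecting data availability. Each thread owns a private framelet, is enabled only once all its operands have arrived, and then runs to completion without suspending.

```python
from collections import deque


class Framelet:
    """Private operand storage for one thread (hypothetical layout)."""

    def __init__(self, body, n_inputs):
        self.body = body                 # code run once every operand is present
        self.slots = [None] * n_inputs   # operand slots owned by this thread only
        self.pending = n_inputs          # assumed synchronization counter


class Scheduler:
    """Enables threads purely on data availability (illustrative only)."""

    def __init__(self):
        self.ready = deque()

    def send(self, framelet, slot, value):
        # Deliver one operand; the thread becomes ready when none are missing.
        framelet.slots[slot] = value
        framelet.pending -= 1
        if framelet.pending == 0:
            self.ready.append(framelet)

    def run(self):
        while self.ready:
            f = self.ready.popleft()
            # Non-blocking: the body runs to completion and never waits.
            f.body(self, *f.slots)


def demo():
    sched = Scheduler()
    results = []
    # Consumer thread: one operand, records it.
    out = Framelet(lambda s, total: results.append(total), 1)
    # Producer thread: two operands, forwards their sum to the consumer.
    add = Framelet(lambda s, a, b: s.send(out, 0, a + b), 2)
    sched.send(add, 1, 4)      # one operand present: thread not yet enabled
    assert not sched.ready
    sched.send(add, 0, 3)      # second operand arrives: thread is enabled
    sched.run()
    return results
```

In a frame-based variant, the threads of one code-block would instead index into a single shared storage segment, which is where the blocking model described above looks for inter-thread locality.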
Similar resources
Execution And Cache Performance Of A Decoupled Non-Blocking Multithreaded Architecture
In this paper we present an evaluation of the execution performance and cache behavior of a new multithreaded architecture being investigated by the authors. Our architecture uses a non-blocking multithreaded model based on the dataflow paradigm. In addition, all memory accesses are decoupled from the thread execution. Data is pre-loaded into the thread context (registers), and all results are p...
Analysis of the I-Structure Software Cache on Multi-Threading Systems
Non-Blocking Multithreaded execution models have been proposed as an effective means to overlap computation and communication in distributed memory systems without any hardware support. Even with the capability of latency tolerance in these execution models, each remote memory request still incurs the cost of communication interface overhead. We therefore designed and implemented our I-Structur...
Performance Evaluation of a Non-Blocking Multithreaded Architecture for Embedded, Real-Time and DSP Applications
This paper presents the evaluation of a non-blocking, decoupled memory/execution, multithreaded architecture known as the Scheduled Dataflow (SDF). The major recent trend in digital signal processor (DSP) architecture is to use complex organizations to exploit instruction level parallelism (ILP). The two most common approaches for exploiting the ILP are Superscalars and Very Long Instruction Wo...
Application Controlled IPC Synchrony - An Event Driven Multithreaded Approach
Interprocess communication (IPC) is an important mechanism in distributed computing and operating systems. Microkernels of modern operating systems use synchronous IPC semantics for every individual process. On the other hand, a process may exploit non-blocking IPC semantics. In either case, the controlling mechanism lies in the hands of the underlying operating system. IPC monitors open up f...
A Non-blocking Multithreaded Architecture with Support for Speculative Threads
In this paper we provide both a qualitative and a quantitative evaluation of a decoupled multithreaded architecture that uses non-blocking threads. Our architecture is based on simple in-order pipelines and complete decoupling of memory accesses from execution pipelines. We extend the architecture to support thread level speculation using snooping cache coherency protocols. We evaluate the perf...
Publication date: 1997